Compsci 650 Applied Information Theory Lecture 4
Abstract
Since the probabilities of the symbols on an English keyboard, P(a), P(b), ..., P(" "), ..., P(;), are not uniform, we should be able to reduce the number of bits required to encode them. From these symbol probabilities we can compute the entropy H(E) ≈ 4.5 bits/char. If we use the Huffman coding taught in this lecture to encode keyboard symbols, we need only about 4.7 bits/char. Furthermore, the characters in English text are not independent. For example, the number of meaningful English strings of length 8 is much smaller than 96^8, and some strings occur much more often than others. Taking such dependencies into account, the entropy drops to about 2.4 bits/char. If we further extend the string length to infinity, the estimated entropy of English becomes H(E∞) ≈ 1.3 bits/char. Encoding English text with lossless data compression algorithms yields the following rates: LZW ≈ 3.7 bits/char, GZip ≈ 2.7 bits/char, BW ≈ 1.89 bits/char, where LZW will be taught in a future lecture, and BW is an industry standard that achieves good performance by combining LZW, GZip, and several other compression techniques. Nowadays storage has become much cheaper, but internet bandwidth remains an expensive resource. This is one of the reasons why compression techniques still play an important role in our lives.
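The entropy and Huffman figures above can be reproduced on any sample text. The following is a minimal sketch (not from the lecture; the sample string and helper names are illustrative) that estimates symbol probabilities from a string, computes the Shannon entropy H, and builds a Huffman code whose average length is guaranteed to satisfy H ≤ L < H + 1:

```python
import heapq
from collections import Counter
from math import log2

def entropy(probs):
    """Shannon entropy in bits per symbol: H = -sum p * log2(p)."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)

def huffman_lengths(probs):
    """Return {symbol: code length} for a Huffman code built from probs."""
    # Heap items are (probability, tiebreak, {symbol: depth}); the unique
    # tiebreak integer prevents Python from ever comparing the dicts.
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper, i.e. gains one code bit.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "the quick brown fox jumps over the lazy dog"
counts = Counter(text)
total = sum(counts.values())
probs = {s: c / total for s, c in counts.items()}

H = entropy(probs)
lengths = huffman_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(f"H = {H:.3f} bits/char, Huffman average = {avg_len:.3f} bits/char")
```

On a large corpus of English text, the same computation gives per-symbol entropy near the 4.5 bits/char cited above; Huffman coding cannot exploit dependencies between characters, which is why block-based schemes like LZW and GZip reach lower rates.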